Automatic Recognition of Composite Verb Forms in Serbian
Abstract
This paper presents work on building a shallow parser for recognizing composite verb forms in Serbian, that is, forms that consist of an auxiliary verb and a main verb. The parser is implemented in Unitex, a corpus processing system, as a set of local grammars that rely on morphological dictionaries of Serbian. The model was tested in several phases on a small corpus of texts, both written in Serbian and translated into Serbian (171 kw in total). In the current phase, an average of 95.8% of units are correctly recognized; the translation of Jules Verne's Around the World in 80 Days gives the best result (98.8%), and a short story by Ivo Andrić, A Vacation in the South, gives the worst (91.7%).
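The general idea of dictionary-driven recognition can be illustrated with a minimal Python sketch. This is not the Unitex grammar described in the abstract: the tiny lexicon, the tag names (`AUX`, `PART`, `INF`), the `max_gap` parameter, and the function `find_composite_forms` are all hypothetical and chosen only for illustration. The sketch pairs an auxiliary with a following main verb form when their combination matches a known pattern, allowing a few intervening tokens.

```python
# Toy lexicon mapping a word form to (lemma, tag).
# Hypothetical stand-in for a real morphological dictionary.
LEXICON = {
    "sam": ("biti", "AUX"),
    "je": ("biti", "AUX"),
    "su": ("biti", "AUX"),
    "ću": ("hteti", "AUX"),
    "će": ("hteti", "AUX"),
    "čitao": ("čitati", "PART"),   # active (l-)participle
    "došla": ("doći", "PART"),
    "čitati": ("čitati", "INF"),   # infinitive
    "doći": ("doći", "INF"),
}

# Which (auxiliary lemma, main verb tag) pairs form a composite tense.
PATTERNS = {
    ("biti", "PART"): "perfect",
    ("hteti", "INF"): "future I",
}


def find_composite_forms(tokens, max_gap=2):
    """Return (auxiliary, main form, main lemma, tense) tuples.

    Only the order auxiliary ... main verb is handled, with at most
    `max_gap` tokens in between; this is an assumption of the sketch.
    """
    results = []
    for i, token in enumerate(tokens):
        entry = LEXICON.get(token.lower())
        if entry is None or entry[1] != "AUX":
            continue
        aux_lemma = entry[0]
        # Look ahead over the next 1 + max_gap positions.
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            main = LEXICON.get(tokens[j].lower())
            if main is None:
                continue
            tense = PATTERNS.get((aux_lemma, main[1]))
            if tense is not None:
                results.append((token, tokens[j], main[0], tense))
                break
    return results


if __name__ == "__main__":
    print(find_composite_forms("Ja sam juče čitao knjigu".split()))
    print(find_composite_forms("On će sutra doći".split()))
```

The sketch ignores cases a real grammar would need to cover, such as the main verb preceding the auxiliary, ambiguous word forms, and agreement checks.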
Similar resources
Composite Tense Recognition and Tagging in Serbian
The technology of finite-state transducers is implemented to recognize, lemmatize and tag composite tenses in Serbian in a way that connects the auxiliary and main verb. The suggested approach uses a morphological electronic dictionary of simple words and appropriate local grammars.
A Framework for Automatic Acquisition of Croatian and Serbian Verb Aspect from Corpora
Verb aspect is a grammatical and lexical category that encodes temporal unfolding and duration of events described by verbs. It is a potentially interesting source of information for various computational tasks, but has so far not been studied in much depth from the perspective of automatic processing. Slavic languages are particularly interesting in this respect, as they encode aspect through ...
Dimensionality Reduction and Improving the Performance of Automatic Modulation Classification using Genetic Programming (RESEARCH NOTE)
This paper shows how genetic programming can be used to select suitable features for automatic modulation recognition. Automatic modulation recognition is one of the essential components of modern receivers. In this regard, the selection of suitable features may significantly affect the performance of the process. Simulations were conducted with SNRs of 5 dB and 10 dB. Test and ...
Local Grammars and Compound Verb Lemmatization in Serbo-Croatian
The increasing production of electronic (digital) texts (either on the Web or in other electronically available forms, such as digital libraries or archives) demands appropriate computer tools that can help human users in text manipulation and, additionally, in performing automatic processing of language resources. In the first place, a natural language processing (NLP) system needs to implemen...
Speech Technologies for Serbian and Kindred South Slavic Languages
This chapter will present the results of the research and development of speech technologies for Serbian and other kindred South Slavic languages used in five countries of the Western Balkans, carried out by the University of Novi Sad, Serbia in cooperation with the company AlfaNum. The first section will describe particularities of highly inflected languages (such as Serbian and other language...